Skew-normal distribution#
The skew-normal distribution extends the Normal distribution by adding a shape parameter that controls asymmetry (skewness) while keeping Gaussian tails.
1) Title & classification#
| Item | Value |
|---|---|
| Name | Skew-normal (\(\text{SN}(\xi,\omega,a)\)) |
| Type | Continuous |
| Support | \(x \in (-\infty,\infty)\) |
| Parameters | shape \(a\in\mathbb{R}\), location \(\xi\in\mathbb{R}\), scale \(\omega>0\) |
What you’ll be able to do after this notebook#
- write the PDF/CDF and connect the model to the Normal distribution
- interpret how the shape parameter changes skewness (and its limits)
- derive the mean/variance and the log-likelihood
- sample from a skew-normal using NumPy only
- use scipy.stats.skewnorm for evaluation, simulation, and fitting
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from scipy import stats
from scipy.special import ndtr, log_ndtr, owens_t
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
2) Intuition & motivation#
A simple way to view the skew-normal is:
Start with a standard Normal density \(\varphi(z)\).
Multiply by a tilting factor \(\Phi(a z)\).
Because \(\Phi(a z)\) is close to 1 on one side of 0 and close to 0 on the other (depending on the sign of \(a\)), the product up-weights one side and down-weights the other, creating skewness.
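A quick numeric check (a sketch, not one of the notebook's later helpers) confirms that the factor 2 restores normalization for any shape: \(\mathbb{E}[\Phi(aZ)]=\tfrac12\) by symmetry when \(Z\) is standard Normal, so the tilted density integrates to 1.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def tilted_density(z, a):
    # 2 * phi(z) * Phi(a z): the Normal density tilted by the CDF factor
    return 2.0 * stats.norm.pdf(z) * stats.norm.cdf(a * z)

# The tilted density integrates to 1 for every shape a (illustrative values).
for a in [-5.0, 0.0, 0.7, 3.0]:
    total, _ = quad(tilted_density, -np.inf, np.inf, args=(a,))
    assert abs(total - 1.0) < 1e-6
```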
Typical use cases:
Skewed noise / residuals in regression and time series (when tails are still roughly Gaussian).
Asymmetric measurement errors (e.g., one-sided effects, saturation effects).
Building block for mixtures (mixtures of skew-normals can approximate flexible shapes).
Relations to other distributions:
\(a=0\) gives an ordinary Normal: \(\text{SN}(\xi,\omega,0)=\mathcal{N}(\xi,\omega^2)\).
As \(a\to +\infty\), the standardized skew-normal approaches a half-normal on the right; as \(a\to -\infty\), it approaches a reflected half-normal.
It can be derived from a bivariate Normal via a hidden truncation / conditioning construction (used for sampling).
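The hidden-truncation view can be checked by simulation (a sketch; the correlation \(\delta=0.8\) and the sample size are illustrative): conditioning one component of a correlated bivariate Normal on the other being positive yields a skew-normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hidden truncation: draw (X1, X2) bivariate Normal with correlation delta
# and keep X1 only when X2 > 0. The kept values follow SN(0, 1, a) with
# a = delta / sqrt(1 - delta**2).
delta = 0.8
a = delta / np.sqrt(1.0 - delta**2)
cov = [[1.0, delta], [delta, 1.0]]
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T
kept = x1[x2 > 0]

# Compare a few empirical quantiles to skew-normal quantiles
qs = [0.1, 0.5, 0.9]
emp = np.quantile(kept, qs)
theo = stats.skewnorm.ppf(qs, a)
print(np.max(np.abs(emp - theo)))  # small Monte Carlo gap
```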
3) Formal definition#
Let \(X\sim\text{SN}(\xi,\omega,a)\) with location \(\xi\in\mathbb{R}\), scale \(\omega>0\), and shape \(a\in\mathbb{R}\). Define \(z=(x-\xi)/\omega\), and let \(\varphi\) and \(\Phi\) denote the standard Normal PDF and CDF.
PDF#
\[
f(x) = \frac{2}{\omega}\,\varphi(z)\,\Phi(a z)
\]
CDF#
A convenient closed form uses Owen’s \(T\) function \(T(h,a)\):
\[
F(x) = \Phi(z) - 2\,T(z,a),
\]
where \(z=(x-\xi)/\omega\).
SciPy’s parameterization matches this notebook: scipy.stats.skewnorm(a, loc=xi, scale=omega).
def _as_float_array(x):
return np.asarray(x, dtype=float)
def skewnorm_logpdf(x, a, loc=0.0, scale=1.0):
'''Log-PDF of SkewNormal(loc, scale, a) on (-inf, inf).'''
x = _as_float_array(x)
a = float(a)
loc = float(loc)
scale = float(scale)
if not np.isfinite(a):
raise ValueError("shape a must be finite")
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
# log f(x) = log 2 - log scale + log phi(z) + log Phi(a z)
return np.log(2.0) - np.log(scale) + stats.norm.logpdf(z) + log_ndtr(a * z)
def skewnorm_pdf(x, a, loc=0.0, scale=1.0):
return np.exp(skewnorm_logpdf(x, a, loc=loc, scale=scale))
def skewnorm_cdf(x, a, loc=0.0, scale=1.0):
'''CDF via Owen's T: F(x)=Phi(z) - 2*T(z,a).'''
x = _as_float_array(x)
a = float(a)
loc = float(loc)
scale = float(scale)
if not np.isfinite(a):
raise ValueError("shape a must be finite")
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
return stats.norm.cdf(z) - 2.0 * owens_t(z, a)
# Quick consistency check against SciPy
xs = np.linspace(-3, 3, 7)
a_test, loc_test, scale_test = 3.0, -0.2, 1.4
pdf_close = np.allclose(
skewnorm_pdf(xs, a_test, loc=loc_test, scale=scale_test),
stats.skewnorm.pdf(xs, a_test, loc=loc_test, scale=scale_test),
rtol=1e-10,
atol=1e-12,
)
cdf_close = np.allclose(
skewnorm_cdf(xs, a_test, loc=loc_test, scale=scale_test),
stats.skewnorm.cdf(xs, a_test, loc=loc_test, scale=scale_test),
rtol=1e-10,
atol=1e-12,
)
pdf_close, cdf_close
(True, True)
4) Moments & properties#
A helpful re-parameterization is
\[
\delta = \frac{a}{\sqrt{1+a^2}} \in (-1,1).
\]
For \(X\sim\text{SN}(\xi,\omega,a)\):
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\xi + \omega\,\delta\,\sqrt{\tfrac{2}{\pi}}\) |
| Variance | \(\mathrm{Var}(X)=\omega^2\left(1-\tfrac{2\delta^2}{\pi}\right)\) |
| Skewness | \(\gamma_1=\dfrac{\tfrac{4-\pi}{2}\left(\delta\sqrt{\tfrac{2}{\pi}}\right)^3}{\left(1-\tfrac{2\delta^2}{\pi}\right)^{3/2}}\) |
| Excess kurtosis | \(\gamma_2=\dfrac{2(\pi-3)\left(\delta\sqrt{\tfrac{2}{\pi}}\right)^4}{\left(1-\tfrac{2\delta^2}{\pi}\right)^2}\) |
| MGF | \(M_X(t)=2\exp\left(\xi t+\tfrac{1}{2}\omega^2 t^2\right)\,\Phi(\delta\omega t)\) for all real \(t\) |
| CF | \(\varphi_X(t)=2\exp\left(i\xi t-\tfrac{1}{2}\omega^2 t^2\right)\,\Phi(i\delta\omega t)\) |
| Entropy | no simple closed form; \(H(X)=\tfrac{1}{2}\log(2\pi e\,\omega^2) - \mathbb{E}[\log(2\Phi(aZ))]\) with \(Z\sim\text{SN}(0,1,a)\) |
Useful properties:
Location/scale: if \(Z\sim\text{SN}(0,1,a)\) then \(X=\xi+\omega Z\sim\text{SN}(\xi,\omega,a)\).
Symmetry flip: if \(X\sim\text{SN}(\xi,\omega,a)\) then \(-X\sim\text{SN}(-\xi,\omega,-a)\).
Bounded skewness: the skewness achievable by a skew-normal is bounded (it cannot represent arbitrarily extreme skew).
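Both the symmetry flip and the skewness bound are easy to verify numerically (a small sketch; the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

# Symmetry flip: the density of -X under SN(xi, omega, a) equals the
# density under SN(-xi, omega, -a) evaluated at x.
xs = np.linspace(-4.0, 4.0, 9)
a, xi, omega = 3.0, 0.5, 1.2
lhs = stats.skewnorm.pdf(-xs, a, loc=xi, scale=omega)
rhs = stats.skewnorm.pdf(xs, -a, loc=-xi, scale=omega)
assert np.allclose(lhs, rhs)

# Bounded skewness: as |a| -> inf, delta -> 1 and gamma_1 approaches its
# supremum, obtained by plugging delta = 1 into the skewness formula.
delta = 1.0
c = delta * np.sqrt(2.0 / np.pi)
gamma1_max = (4.0 - np.pi) / 2.0 * c**3 / (1.0 - 2.0 * delta**2 / np.pi) ** 1.5
print(round(gamma1_max, 4))  # ≈ 0.9953
```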
def _delta(a):
a = float(a)
return a / np.sqrt(1.0 + a * a)
def skewnorm_mean(a, loc=0.0, scale=1.0):
d = _delta(a)
return float(loc) + float(scale) * d * np.sqrt(2.0 / np.pi)
def skewnorm_var(a, scale=1.0):
d = _delta(a)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
return (scale**2) * (1.0 - 2.0 * d * d / np.pi)
def skewnorm_skewness(a):
d = _delta(a)
c = d * np.sqrt(2.0 / np.pi)
return ((4.0 - np.pi) / 2.0) * (c**3) / ((1.0 - 2.0 * d * d / np.pi) ** 1.5)
def skewnorm_excess_kurtosis(a):
d = _delta(a)
c = d * np.sqrt(2.0 / np.pi)
return 2.0 * (np.pi - 3.0) * (c**4) / ((1.0 - 2.0 * d * d / np.pi) ** 2)
def skewnorm_mgf(t, a, loc=0.0, scale=1.0):
'''MGF for real t.'''
t = _as_float_array(t)
a = float(a)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
d = _delta(a)
return 2.0 * np.exp(loc * t + 0.5 * (scale * t) ** 2) * ndtr(d * scale * t)
def skewnorm_cf(t, a, loc=0.0, scale=1.0):
'''Characteristic function; relies on scipy.special.ndtr accepting complex arguments (available in recent SciPy versions).'''
t = _as_float_array(t)
a = float(a)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
d = _delta(a)
return 2.0 * np.exp(1j * loc * t - 0.5 * (scale * t) ** 2) * ndtr(1j * d * scale * t)
# Moment sanity check
a_demo, loc_demo, scale_demo = 4.0, 0.5, 1.2
mean_theory = skewnorm_mean(a_demo, loc=loc_demo, scale=scale_demo)
var_theory = skewnorm_var(a_demo, scale=scale_demo)
skew_theory = skewnorm_skewness(a_demo)
exkurt_theory = skewnorm_excess_kurtosis(a_demo)
mean_scipy, var_scipy, skew_scipy, exkurt_scipy = stats.skewnorm.stats(
a_demo, loc=loc_demo, scale=scale_demo, moments="mvsk"
)
(mean_theory, mean_scipy), (var_theory, var_scipy), (skew_theory, skew_scipy), (exkurt_theory, exkurt_scipy)
((1.4288740671735822, 1.4288740671735822),
(0.5771929673324073, 0.5771929673324073),
(0.7844267553823129, 0.7844267553823128),
(0.6327847548211796, 0.6327847548211796))
5) Parameter interpretation#
Shape \(a\) (skewness control)#
\(a=0\) gives a symmetric Normal distribution.
\(a>0\) produces right-skew (more mass to the right).
\(a<0\) produces left-skew.
As \(|a|\to\infty\), the standardized distribution tends to a (signed) half-normal.
A convenient mapping is \(\delta = a/\sqrt{1+a^2}\), which compresses \(a\in\mathbb{R}\) into \(\delta\in(-1,1)\). Many formulas depend on \(\delta\).
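The half-normal limit can be checked directly (the shape value \(a=200\) is an illustrative stand-in for "large"): for large \(a\), \(\Phi(az)\approx 1\) for \(z>0\) and \(\approx 0\) for \(z<0\), leaving \(2\varphi(z)\) on the positive half-line.

```python
import numpy as np
from scipy import stats

# Compare SN(0, 1, a) for large a against the half-normal on z > 0.
x = np.linspace(0.05, 4.0, 200)
gap = np.max(np.abs(stats.skewnorm.pdf(x, 200.0) - stats.halfnorm.pdf(x)))
print(gap)  # negligible away from z = 0
```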
Location \(\xi\) and scale \(\omega\)#
\(\xi\) shifts the distribution left/right.
\(\omega\) scales it.
Important: when \(a\ne 0\), loc and scale are not the mean and standard deviation. Use the moment formulas from Section 4.
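A minimal illustration of this caveat (parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

# With a != 0, loc/scale are NOT the mean/standard deviation.
a, loc, scale = 5.0, 0.0, 1.0
mean, var = stats.skewnorm.stats(a, loc=loc, scale=scale, moments="mv")
print(float(mean), float(np.sqrt(var)))  # ≈ 0.7824 and ≈ 0.6228, not 0 and 1
```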
# PDF: changing shape a (keep loc=0, scale=1)
x = np.linspace(-4, 4, 800)
a_values = [-8, -3, 0, 3, 8]
fig = go.Figure()
for a in a_values:
fig.add_trace(go.Scatter(x=x, y=skewnorm_pdf(x, a), mode="lines", name=f"a={a}"))
fig.update_layout(
title="Skew-normal PDF for different shape values (loc=0, scale=1)",
xaxis_title="x",
yaxis_title="density",
)
fig.show()
# Location/scale effects (fix shape)
a_fixed = 5.0
x = np.linspace(-6, 8, 900)
fig = go.Figure()
for loc, scale in [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0)]:
fig.add_trace(
go.Scatter(
x=x,
y=skewnorm_pdf(x, a_fixed, loc=loc, scale=scale),
mode="lines",
name=f"loc={loc}, scale={scale}",
)
)
fig.update_layout(
title=f"Skew-normal PDF with fixed shape a={a_fixed:g}",
xaxis_title="x",
yaxis_title="density",
)
fig.show()
6) Derivations#
A standard construction for \(Z\sim\text{SN}(0,1,a)\) uses two independent standard Normals:
Draw \(U,V\stackrel{\text{iid}}{\sim}\mathcal{N}(0,1)\).
Let \(\delta = a/\sqrt{1+a^2}\) and define
\[
Z = \delta\,|U| + \sqrt{1-\delta^2}\,V.
\]
This produces a skew-normal variable with shape \(a\), and it makes moment calculations easy.
Expectation#
Because \(U\) and \(V\) are independent, \(\mathbb{E}[V]=0\), and \(\mathbb{E}|U|=\sqrt{2/\pi}\),
\[
\mathbb{E}[Z] = \delta\,\mathbb{E}|U| + \sqrt{1-\delta^2}\,\mathbb{E}[V] = \delta\sqrt{\tfrac{2}{\pi}}.
\]
For \(X=\xi+\omega Z\), we get \(\mathbb{E}[X]=\xi+\omega\delta\sqrt{2/\pi}\).
Variance#
Using \(\mathrm{Var}(|U|)=1-2/\pi\) and \(\mathrm{Var}(V)=1\):
\[
\mathrm{Var}(Z) = \delta^2\left(1-\tfrac{2}{\pi}\right) + (1-\delta^2) = 1-\tfrac{2\delta^2}{\pi}.
\]
So \(\mathrm{Var}(X)=\omega^2\left(1-\tfrac{2\delta^2}{\pi}\right)\).
Likelihood#
For observations \(x_1,\dots,x_n\) and parameters \((\xi,\omega,a)\), let \(z_i=(x_i-\xi)/\omega\). The log-likelihood is
\[
\ell(\xi,\omega,a) = n\log 2 - n\log\omega + \sum_{i=1}^{n}\log\varphi(z_i) + \sum_{i=1}^{n}\log\Phi(a z_i).
\]
Numerically, the \(\log\Phi(\cdot)\) term should be computed with a stable logcdf/log_ndtr to avoid underflow.
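A small demonstration of why the stable form matters (the value \(t=-40\) is illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import log_ndtr

# Naive log(Phi(t)) underflows once Phi(t) rounds to 0.0 in float64,
# while log_ndtr works directly in log space and stays finite.
t = -40.0
with np.errstate(divide="ignore"):
    naive = np.log(stats.norm.cdf(t))  # -inf: cdf underflowed to 0.0
stable = log_ndtr(t)                   # finite, ≈ -804.6
print(naive, stable)
```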
def skewnorm_loglik(x, a, loc=0.0, scale=1.0):
x = _as_float_array(x)
return float(np.sum(skewnorm_logpdf(x, a, loc=loc, scale=scale)))
# Example log-likelihood value
x_demo = np.array([-1.2, -0.3, 0.1, 0.7, 2.0])
skewnorm_loglik(x_demo, a=2.5, loc=0.0, scale=1.0)
-12.78999919486678
7) Sampling & simulation (NumPy-only)#
The construction from Section 6 gives a simple NumPy-only sampler.
Algorithm for \(X\sim\text{SN}(\xi,\omega,a)\):
Compute \(\delta = a/\sqrt{1+a^2}\).
Draw \(U,V\sim\mathcal{N}(0,1)\) independently.
Set \(Z = \delta |U| + \sqrt{1-\delta^2}\,V\).
Return \(X = \xi + \omega Z\).
This is fast and vectorizes well.
def skewnorm_rvs_numpy(a, loc=0.0, scale=1.0, size=1, rng=None):
'''Sample from SkewNormal(a, loc, scale) using NumPy only.'''
if rng is None:
rng = np.random.default_rng()
a = float(a)
loc = float(loc)
scale = float(scale)
if not np.isfinite(a):
raise ValueError("shape a must be finite")
if scale <= 0:
raise ValueError("scale must be > 0")
size_tuple = (size,) if isinstance(size, int) else tuple(size)
d = _delta(a)
u = rng.normal(size=size_tuple)
v = rng.normal(size=size_tuple)
z = d * np.abs(u) + np.sqrt(1.0 - d * d) * v
return loc + scale * z
def skewnorm_entropy_mc(a, loc=0.0, scale=1.0, n=200_000, rng=None):
'''Monte Carlo estimate of differential entropy H(X) = -E[log f(X)].'''
if rng is None:
rng = np.random.default_rng()
x = skewnorm_rvs_numpy(a, loc=loc, scale=scale, size=n, rng=rng)
return float(-np.mean(skewnorm_logpdf(x, a, loc=loc, scale=scale)))
# Smoke test: sample moments close to theory
a_test, loc_test, scale_test = 5.0, -0.2, 1.4
s = skewnorm_rvs_numpy(a_test, loc=loc_test, scale=scale_test, size=80_000, rng=rng)
sample_mean = s.mean()
sample_var = s.var()
theory_mean = skewnorm_mean(a_test, loc=loc_test, scale=scale_test)
theory_var = skewnorm_var(a_test, scale=scale_test)
(sample_mean, theory_mean), (sample_var, theory_var)
((0.8949220542394136, 0.8953462544575976),
(0.7695971251211239, 0.7602165828457118))
8) Visualization#
We’ll visualize:
the PDF for a chosen parameter setting
the CDF and an empirical CDF from Monte Carlo samples
a histogram vs theoretical PDF sanity check
def ecdf(samples):
x = np.sort(np.asarray(samples))
y = np.arange(1, x.size + 1) / x.size
return x, y
a_viz, loc_viz, scale_viz = 6.0, 0.0, 1.0
n_viz = 80_000
samples = skewnorm_rvs_numpy(a_viz, loc=loc_viz, scale=scale_viz, size=n_viz, rng=rng)
q_lo, q_hi = np.quantile(samples, [0.002, 0.998])
x_grid = np.linspace(float(q_lo), float(q_hi), 700)
# PDF + histogram
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=80,
histnorm="probability density",
name="Monte Carlo samples",
opacity=0.55,
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=skewnorm_pdf(x_grid, a_viz, loc=loc_viz, scale=scale_viz),
mode="lines",
name="Theoretical PDF",
line=dict(width=3),
)
)
fig.update_layout(
title=f"SkewNorm(a={a_viz:g}, loc={loc_viz:g}, scale={scale_viz:g}): histogram vs PDF",
xaxis_title="x",
yaxis_title="density",
bargap=0.02,
)
fig.show()
# CDF + empirical CDF
xs, ys = ecdf(samples)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x_grid,
y=skewnorm_cdf(x_grid, a_viz, loc=loc_viz, scale=scale_viz),
mode="lines",
name="Theoretical CDF",
line=dict(width=3),
)
)
fig.add_trace(
go.Scatter(
x=xs[::120],
y=ys[::120],
mode="markers",
name="Empirical CDF (subsampled)",
marker=dict(size=5),
)
)
fig.update_layout(
title=f"SkewNorm(a={a_viz:g}, loc={loc_viz:g}, scale={scale_viz:g}): empirical CDF vs CDF",
xaxis_title="x",
yaxis_title="CDF",
)
fig.show()
9) SciPy integration (scipy.stats.skewnorm)#
SciPy provides a ready-to-use implementation:
- stats.skewnorm.pdf(x, a, loc=..., scale=...)
- stats.skewnorm.cdf(x, a, loc=..., scale=...)
- stats.skewnorm.rvs(a, loc=..., scale=..., size=..., random_state=...)
- stats.skewnorm.fit(data) for MLE parameter fitting
You can also create a “frozen” distribution object: dist = stats.skewnorm(a, loc=..., scale=...).
a_true, loc_true, scale_true = 4.0, -0.3, 1.7
dist = stats.skewnorm(a_true, loc=loc_true, scale=scale_true)
x = np.linspace(loc_true - 5 * scale_true, loc_true + 5 * scale_true, 500)
pdf = dist.pdf(x)
cdf = dist.cdf(x)
samples_scipy = dist.rvs(size=5000, random_state=rng)
# MLE fit
# (Note: MLE can be sensitive for some datasets; see Pitfalls.)
a_hat, loc_hat, scale_hat = stats.skewnorm.fit(samples_scipy)
(a_hat, loc_hat, scale_hat)
(4.218454607392225, -0.3213416599784268, 1.720966601172293)
10) Statistical use cases#
A) Hypothesis testing (is the data symmetric?)#
A common question is whether a Normal model is adequate. Since the Normal is the special case \(a=0\), we can test:
\(H_0: a=0\) (Normal)
\(H_1: a\ne 0\) (skew-normal)
A simple approach is a likelihood ratio test using MLEs. Under standard regularity conditions, the test statistic is asymptotically \(\chi^2_1\).
B) Bayesian modeling#
In Bayesian models, skew-normal likelihoods let you encode asymmetric noise while keeping light tails. There is no conjugacy, but posteriors are easy to sample (e.g., via MCMC) or approximate.
Below we show a lightweight grid posterior for the shape parameter \(a\) after standardizing the data.
C) Generative modeling#
Skew-normals are useful building blocks in generative models:
mixture models (flexible density estimation)
skewed latent variables (to model asymmetric factors)
skewed observation noise in simulators
def normal_loglik_mle(x):
x = _as_float_array(x)
mu_hat = float(np.mean(x))
sigma_hat = float(np.std(x, ddof=0)) # MLE uses ddof=0
return float(np.sum(stats.norm.logpdf(x, loc=mu_hat, scale=sigma_hat))), (mu_hat, sigma_hat)
def skewnorm_loglik_mle(x):
x = _as_float_array(x)
a_hat, loc_hat, scale_hat = stats.skewnorm.fit(x)
ll = float(np.sum(stats.skewnorm.logpdf(x, a_hat, loc=loc_hat, scale=scale_hat)))
return ll, (a_hat, loc_hat, scale_hat)
# Example: test on skewed data
x1 = stats.skewnorm.rvs(5.0, loc=0.0, scale=1.0, size=2000, random_state=rng)
ll0, params0 = normal_loglik_mle(x1)
ll1, params1 = skewnorm_loglik_mle(x1)
lr = 2.0 * (ll1 - ll0)
p_value = 1.0 - stats.chi2.cdf(lr, df=1)
params0, params1, lr, p_value
((0.7694839518527028, 0.5944520723662186),
(4.781273301264674, 0.022768496311229187, 0.9544398008289063),
288.62538440764183,
0.0)
# Grid posterior over a after standardizing (demo)
x = x1
z = (x - x.mean()) / x.std(ddof=0)
grid = np.linspace(-12, 12, 801)
loglik = np.array([skewnorm_loglik(z, a, loc=0.0, scale=1.0) for a in grid])
# Prior: a ~ Normal(0, 5^2)
logprior = stats.norm.logpdf(grid, loc=0.0, scale=5.0)
logpost = loglik + logprior
logpost -= np.max(logpost)
post = np.exp(logpost)
post /= np.trapz(post, grid)
# Posterior mean + 95% credible interval
post_mean = float(np.trapz(grid * post, grid))
dx = grid[1] - grid[0]
cdf = np.concatenate([[0.0], np.cumsum((post[:-1] + post[1:]) * 0.5) * dx])
cdf /= cdf[-1]
ci_low = float(np.interp(0.025, cdf, grid))
ci_high = float(np.interp(0.975, cdf, grid))
post_mean, (ci_low, ci_high)
(0.00011139849742594632, (-0.05972705368089612, 0.05989457000981375))
fig = go.Figure()
fig.add_trace(go.Scatter(x=grid, y=post, mode="lines", name="posterior p(a|data)"))
fig.update_layout(
title="Grid posterior for shape a (standardized data; Normal(0,5^2) prior)",
xaxis_title="a",
yaxis_title="density",
)
fig.show()
# Generative example: linear model with skewed noise
n = 1500
x = rng.uniform(-2, 2, size=n)
noise = skewnorm_rvs_numpy(a=6.0, loc=0.0, scale=0.6, size=n, rng=rng)
y = 1.5 * x + noise
fig = px.scatter(x=x, y=y, opacity=0.35, title="y = 1.5x + skew-normal noise")
fig.update_layout(xaxis_title="x", yaxis_title="y")
fig.show()
# Residual distribution (should be skewed)
resid = y - 1.5 * x
fig = px.histogram(resid, nbins=60, histnorm="probability density", title="Residuals")
fig.update_layout(xaxis_title="residual", yaxis_title="density")
fig.show()
11) Pitfalls#
- Parameter constraints: scale must be strictly positive; a should be finite.
- Interpretation: when \(a\ne 0\), loc and scale are not mean/std.
- Skewness is bounded: a skew-normal cannot represent arbitrarily extreme skew.
- Tail behavior: tails remain Gaussian; heavy-tailed skewed data often need a skew-\(t\) or other alternatives.
- Numerical stability: the likelihood uses \(\log\Phi(az)\); for large negative \(az\), naive np.log(stats.norm.cdf(...)) underflows. Prefer stats.norm.logcdf or scipy.special.log_ndtr.
- Fitting: MLE (stats.skewnorm.fit) can be sensitive, sometimes returning very large \(|a|\) when data look close to half-normal. Always check diagnostics and consider multiple initializations if needed.
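A sketch of trying multiple initializations and keeping the best fit by log-likelihood (the starting values here are arbitrary). In rv_continuous.fit, positional arguments after the data seed the shape parameter, and loc=/scale= seed location and scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.skewnorm.rvs(8.0, loc=0.0, scale=1.0, size=3000, random_state=rng)

# Fit from several shape starting points, seeding loc/scale from the sample.
fits = [stats.skewnorm.fit(data, a0, loc=np.mean(data), scale=np.std(data))
        for a0 in (0.5, 2.0, 10.0)]

# Keep the parameter triple with the highest log-likelihood.
best = max(fits, key=lambda p: np.sum(stats.skewnorm.logpdf(data, *p)))
print(best)
```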
12) Summary#
- skewnorm is a continuous distribution on \(\mathbb{R}\) that adds a shape parameter to the Normal to model asymmetry.
- The PDF is \(\tfrac{2}{\omega}\varphi(z)\Phi(az)\); the CDF can be written with Owen’s \(T\).
- Mean/variance are available in closed form via \(\delta=a/\sqrt{1+a^2}\), but entropy generally needs numerical evaluation.
- A simple NumPy-only sampler comes from the representation \(Z=\delta|U|+\sqrt{1-\delta^2}\,V\).
- In practice, scipy.stats.skewnorm covers evaluation, sampling, and MLE fitting.